#
# Oliver Soop oliversoop@gmail.com
# 12. August 2013
#
This script parses URLs and metainformation from web pages. The web pages are stored in a database, and the script takes dumps of these database tables as input.
The project consists of the following Python scripts:
	MetainfoParser.py - Parses the metainformation of HTML pages and RSS feeds using the Tika 1.4 library. A more detailed description is given in the file itself.
	URLParser.py - Parses URLs from different texts (here, HTML pages and RSS feeds). A more detailed description is given in the file itself.
	URLAndMetainfoParsingScript.py - Combines the two scripts above: it loads the dump files and coordinates the parsing of the necessary information. A more detailed description is given in the file itself.
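
URLParser.py itself is not reproduced here. As a rough illustration of the kind of extraction it performs, a minimal regex-based sketch could look like the following. The pattern and the function name extract_urls are assumptions, not the script's actual code, and real HTML or RSS input may need its markup parsed first.

```python
import re

# Assumed pattern: an http(s) URL runs until whitespace, a quote or an angle bracket.
URL_PATTERN = re.compile(r'https?://[^\s"\'<>]+')


def extract_urls(text):
    """Return the unique URLs found in text, in order of first appearance."""
    seen = set()
    urls = []
    for match in URL_PATTERN.finditer(text):
        # Drop trailing punctuation that usually ends a sentence, not a URL.
        url = match.group(0).rstrip('.,;:)')
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


if __name__ == '__main__':
    sample = ('<a href="http://example.com/page">link</a> '
              '<link>https://example.org/feed.xml</link> see http://example.com/page.')
    print(extract_urls(sample))
    # prints ['http://example.com/page', 'https://example.org/feed.xml']
```

The code uses only the standard library and runs under Python 2.7 as well as Python 3.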

Requirements:
	Python BeautifulSoup library, to repair malformed HTML pages
	Tika-App-1.4, to extract metainformation
	Python 2.7
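
MetainfoParser.py uses Tika-App-1.4 to get the metainformation. One plausible way to call it from Python is to run the Tika jar in metadata-only mode (-m) and parse the "name: value" lines it prints. The jar path, the function names and the exact output format are assumptions in this sketch, not details taken from MetainfoParser.py. Check them against your Tika installation.

```python
import subprocess

# Assumed location of the Tika 1.4 application jar; adjust to your setup.
TIKA_JAR = 'tika-app-1.4.jar'


def parse_tika_metadata(output):
    """Turn 'name: value' lines (as assumed to be printed by tika-app -m) into a dict."""
    metadata = {}
    for line in output.splitlines():
        name, sep, value = line.partition(': ')
        if sep:  # skip lines without a 'name: value' separator
            metadata[name.strip()] = value.strip()
    return metadata


def extract_metadata(path, jar=TIKA_JAR):
    """Run tika-app in metadata-only mode on one file and parse its output."""
    output = subprocess.check_output(['java', '-jar', jar, '-m', path])
    return parse_tika_metadata(output.decode('utf-8'))
```

Keeping the parsing separate from the subprocess call lets the parsing be checked without Java or the jar installed.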

To run the script, set dirToProcess in URLAndMetainfoParsingScript.py to the directory that contains the dumps, then run URLAndMetainfoParsingScript.py with Python 2.7.
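
The following is a minimal sketch of how the dump files under dirToProcess might be enumerated before parsing. The directory value and the helper iter_dump_files are hypothetical and are not taken from URLAndMetainfoParsingScript.py.

```python
import os

# Hypothetical value: the directory holding the database table dumps.
dirToProcess = '/path/to/dumps'


def iter_dump_files(directory):
    """Yield the path of every file under directory, in a deterministic order."""
    for root, dirs, files in os.walk(directory):
        dirs.sort()  # sorting in place makes os.walk visit subdirectories in order
        for name in sorted(files):
            yield os.path.join(root, name)


if __name__ == '__main__':
    for path in iter_dump_files(dirToProcess):
        print(path)  # the real script would load and parse each dump here
```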